Introduction



Reviews and syntheses of empirical research studies on a given topic
are a fundamental activity in behavioral research; they usually precede
any major new research study, and also are done as independent scholarly
works. This paper reports a recent investigation of the methods used
for such reviews. The investigation was limited to reviews that are
focused on making inferences about substantive issues from empirical
research. These will be called integrative reviews in this report.
Excluded were reviews of theoretical positions, of methods, and of
non-empirical research.

Given the importance and widespread conduct of integrative reviews,
one might expect that there would be a fairly well developed literature
on methods, techniques, and procedures for conducting such reviews; but
this is not the case. An earlier examination by this author of a
convenience sample of 39 books on general methodology in sociological,
psychological, and educational research revealed there was very little
explanation of matters other than the use of card catalogs, indexes to
periodicals and note-taking. Only four of these books discussed how to
define or sample the universe of sources to be reviewed, three discussed
criteria by which to judge the adequacy of each study, and only two
discussed how to synthesize validly the results of different studies.
None of the discussions exceeded two pages in length.

Similarly, a preliminary examination of journal article titles in
Sociological Abstracts (from January 1973 through October 1975),
Psychological Abstracts (from January 1973 through December 1975) and
Current Index to Journals in Education (from January 1973 through June
197[?]) revealed a dearth of work on integrative review methods. Entries
under the following subject headings were examined: "literature reviews,"
"methods," "methodology," "research methods," and "research reviews."
Only five of the titles out of approximately 2,050 entries appeared
directly relevant. Upon examination, one of the sources proved to be
inappropriate and another could not be located. The remaining three will
be discussed briefly later in this section.

Additional evidence that there are few explicated methods, techniques
and procedures for integrative reviews is the fact that few published
integrative reviews adequately describe the methods used. A preliminary
examination of 87 review articles in the 1974 and 1975 volumes of
American Sociological Review, Sociological Quarterly, Social Problems,
Psychological Bulletin, and Review of Educational Research found only
twelve articles which provided some statement on the methods used.

Doing a good integrative review is never easy. It might seem that
when all or almost all of the studies on the topic yielded similar
results, the work would be easy, but this is incorrect because a careful
reviewer is still obliged to determine whether all the studies have
biases in the same direction which caused similar but invalid results.
In the more prevalent case where the studies on the topic have different,
and apparently contradictory, results, the work is obviously difficult.
A good review of such research should explore the reasons for the
differences in the results and determine what the body of research,
taken as a whole, reveals and does not reveal about the topic.

The most valuable previous writings on integrative review methods
have been done during the last decade.

Kenneth Feldman (1971) wrote that there is "...little formal or
systematic analysis of either the methodology or the importance of...
reviewing and integrating... the 'literature'..." (p. 86). He suggested
that "half-hearted commitment in this area might account in part for
the relatively unimpressive degree of cumulative knowledge in many fields
of the behavioral sciences" (p. 86). He mentioned the problem of not
being able to know the parameters of the universe of relevant studies.
Feldman suggested the utility of examining the distributions of results
in more than one manner, and suggested that inconsistent results can
sometimes be explained by differences in subjects, treatments, settings,
and the quality of the research methods. He warned that reviews should
avoid hypercriticalness as well as hypocriticalness, and indicated that
a good review of research "shows how much is known in an area, [and] also
shows how little is known" (p. 100).

Richard Light and Paul Smith's excellent article (1971) discussed the
present lack of systematic efforts to accumulate information from a set
of disparate studies. Light and Smith used a four category typology to
characterize most present integrative reviews. The first category
comprises those reviews which merely list any factor which has shown an
effect on a given dependent variable in at least one study. A second
category comprises reviews which exclude all studies except those which
support one given point of view. The third category is for those reviews
which, in one way or another, average the relevant statistics across a
complete set of studies. The fourth category is comprised of vote taking:
counting the positive significant results, the non-significant results,
and the negative significant results, and if a plurality of studies have
one of these findings, then that finding is declared the truth.
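
The vote-taking rule summarized above can be sketched in a few lines of
Python. This is only an illustration of the tallying rule, not Light and
Smith's own formulation: the category labels, the function name
vote_count, and the handling of ties (no winner is declared) are
assumptions introduced here, and the input tallies are made up.

```python
from collections import Counter

# Hypothetical labels for the three outcome categories named in the text.
CATEGORIES = ("positive significant", "non-significant", "negative significant")


def vote_count(findings):
    """Tally study findings and return (winner, tallies).

    `winner` is the category held by a plurality of studies, or None
    when the top count is tied or there are no findings (the tie rule
    is an assumption; the source does not address it).
    """
    tallies = Counter(findings)
    unknown = set(tallies) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    ranked = tallies.most_common()
    if not ranked:
        return None, tallies
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, tallies
    return ranked[0][0], tallies


# Made-up input: 12 significant positive findings, 20 non-significant ones.
winner, tallies = vote_count(
    ["positive significant"] * 12 + ["non-significant"] * 20
)
print(winner, dict(tallies))
```

With these made-up tallies the rule declares "non-significant" the
winner, without regard to how many significant results chance alone
would be expected to produce.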

Light and Smith pointed out the weaknesses, and resulting consequences,
of these procedures, and proposed as a superior alternative a paradigm
for secondary analysis of data from various studies which have a common
focus. The paradigm suggests the data ought to be analyzed within strata
that take into account different characteristics of subjects, treatments,
contextual variables, and interaction effects among these. Ironically,
Light and Smith failed to point out that such a paradigm would also be
useful for integrating results of different studies when secondary data
analysis is not feasible. (Time constraints, promises about the
confidentiality of data, lost data sets and other factors sometimes
preclude secondary data analysis.)




Gene Glass (1976) presented an important paper on what he called
"meta-analysis of research." After stating the need for "a rigorous
alternative to the casual, narrative discussions of research studies
which typify our attempts to make sense of the rapidly expanding research
literature," Glass proposed such an alternative. He suggested expressing
the results of studies on a given topic in a common metric, coding the
various characteristics of studies that might have affected their
results, and then using multiple regression equations or other
statistical techniques to study the association of variations of those
characteristics with the variations in the results. This approach
differs from secondary data analysis in that it does not use the data
on the individual subjects within one or more studies, but rather uses
data on the overall characteristics of each study.
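
The three steps just described (common metric, coded study
characteristics, regression of results on characteristics) can be
illustrated with a minimal sketch. Several things here are assumptions
rather than details given in this paper: the common metric is taken to
be the treatment-control difference in control-group standard deviation
units, a single-predictor least-squares line stands in for the multiple
regression mentioned above, and the three "studies" and their
"weeks of treatment" characteristic are invented for illustration.

```python
from statistics import mean


def standardized_difference(mean_treated, mean_control, sd_control):
    """Express one study's result in a common metric.

    Assumption: the metric is the treatment-control difference in
    control-group standard deviation units.
    """
    return (mean_treated - mean_control) / sd_control


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up studies: (treated mean, control mean, control SD, weeks of treatment)
studies = [
    (52.0, 50.0, 10.0, 2),
    (55.0, 50.0, 10.0, 4),
    (58.0, 50.0, 10.0, 6),
]

# Step 1: one common-metric result per study.
effects = [standardized_difference(t, c, sd) for t, c, sd, _ in studies]
# Step 2: one coded characteristic per study.
weeks = [w for *_, w in studies]
# Step 3: relate variation in the characteristic to variation in results.
slope, intercept = least_squares_line(weeks, effects)
print(effects, slope, intercept)
```

The unit of analysis is the study, not the individual subject, which is
the distinction from secondary data analysis drawn above.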

The lack of explicit methods for doing integrative reviews is a
serious problem for at least four reasons. First, the lack of explicit
methods appears to be in large part the result of social scientists
failing to give much thought to such methods, and thus it probably means
that they are not using as powerful methods as could be developed for
accumulating social science evidence. Second, it makes it difficult to
have standards for judging the quality of integrative reviews. Third,
it makes it difficult to train graduate students to do competent research
reviews. Fourth, the lack of review methods hinders the accumulation of
valid knowledge from previous research.

Despite the lack of explicit methodology for doing integrative
reviews, each review is the result of implicit methods, consciously or
unconsciously selected by the reviewer.

This study primarily focused on the methods that are currently being
used for integrative reviews of empirical research in sociology,
psychology and educational research. The study had four objectives:

1. To develop a conceptualization of the various
methodological tasks of integrative reviews and
of the alternative approaches to each task;

2. To estimate the frequency with which current
reviews published in high quality social
science journals used each of the alternative
approaches;

3. To evaluate critically the strengths and weaknesses
of the alternative approaches;

4. To suggest some ways in which more powerful and
valid integrative reviews might be done.

This project investigated several sources of information on the
methods used for integrative reviews. These sources were:

1) a purposive sample of 16 integrative review articles that
were suggested by various persons as being methodologically
exemplary;

2) a random sample of 36 integrative review articles from the 1974,
'75 and '76 volumes of prestigious social science periodicals
(Psychological Bulletin, Annual Review of Psychology, Review
of Educational Research, Review of Research in Education, American
Sociological Review, Sociological Quarterly, Social Problems,
and Annual Review of Sociology); the sample was stratified so
as to yield 12 articles from each of the three disciplines and
equal numbers of articles from each source within a discipline;

3) published rejoinders to the 36 randomly sampled articles;

4) Gene Glass's recent papers on or using meta-analysis (1976a,
1976b, 1977a, 1977c, Smith & Glass, 1977) and personal
communications with him;

5) responses to queries sent to editors of prestigious social
science periodicals that frequently publish integrative reviews
(Psychological Bulletin, Annual Review of Psychology, Annual
Review of Sociology, Review of Research in Education, and
Review of Educational Research);

6) responses to queries sent to officials of national organizations
that were thought to have major responsibilities for reviewing
and synthesizing research in the social, biological or physical
sciences (Consumer Information Branch, National Institute of
Education; Assembly of Behavioral and Social Sciences, National
Academy of Sciences; Congressional Reference Service, Library
of Congress; Information Systems Operation Branch, National
Institute of Education; Special Studies Division, Office of
Research and Development, Environmental Protection Agency;
Developmental Neurology Branch, National Institute of
Neurological and Communicative Disorders and Stroke; Office of Space
Science, National Aeronautics and Space Administration; Program
Analysis and Formulation Branch, National Cancer Institute;
Assembly of Life Sciences, National Academy of Sciences; Assembly
of Mathematical and Physical Sciences, National Academy of
Sciences).



This paper will focus primarily on the results from the coding and
analysis of the 36 randomly sampled articles and on a brief critique of
Glass's proposed meta-analysis.

The purposive sample of allegedly methodologically exemplary review
articles was examined primarily to aid in conceptualizing the nature of
review methodology and to suggest desirable approaches for various
methodological tasks of a review. Following an examination of the
allegedly exemplary articles, and an examination of a substantial number
of other review articles, it was decided to conceptualize the methodology
of integrative reviews as involving seven basic tasks: 1) selecting the
topic(s), 2) consulting previous reviews on the same topic or similar
topics, 3) sampling the research studies that are to be reviewed,
4) representing the characteristics of the studies and their findings,
5) analyzing the characteristics of the studies and their findings,
6) interpreting the results, and 7) reporting the review.

These tasks are analogous to those engaged in when doing primary
research (research that involves collecting original data on individual
subjects or cases). Indeed, this conceptualization was based on the
presumption that reviewers and primary researchers share a common goal
and encounter similar difficulties. The common goal is to make accurate
generalizations about phenomena from limited information.

Since the methodology of primary research was used to conceptualize
the methodology of integrative reviews, the standards for competent
primary research were thought to be the appropriate evaluative criteria
for judging the alternative methodological approaches that were
investigated in this study. The problem, of course, was to decide what
those standards are. Though the methodology of primary research is more
highly developed than the methodology of integrative reviews, there are
many aspects of it about which there is much disagreement among social
scientists. There is much agreement, however, on certain topics. Sampling
theory, as discussed, for instance, by Kish (1965), is widely thought to
provide the best guidelines for samples when the purpose is to make
generalizations from a relatively small sample to a much larger
population. There is much agreement (and some disagreement) on the
appropriateness of alternative descriptive and inferential statistics
that are described, for instance, by Bradley (1968), Hays (1963), and
Kerlinger and Pedhazur (1973). And there is substantial agreement on some
of the major threats to the internal and external validity of any study,
as discussed, for instance, by Campbell and Stanley (1963) and Bracht and
Glass (1968).

A coding instrument was developed to code various aspects of, and
approaches to, each of these tasks. The final version of the coding
instrument had 66 items. It was used to code the 36 randomly sampled
articles. Each article was coded independently by two coders, and all
discrepant codings were afterwards resolved by the coders. Intercoder
reliability and reliability over time were assessed and found quite
satisfactory.



The Results and Discussion of Random Sample



The instrument used to code the random sample of integrative review
articles is 15 pages long and not appended to this paper. The numbers
preceded by a V and enclosed in parentheses in various places throughout
the following text refer to the coding instrument item numbers; copies of
the instrument or clarification of specific items are available from the
author.



Only point estimates are reported in the text. Most, but not all
of them, are based on 36 units of analysis. A 0.95 confidence interval
for N = 36 and X = 6 would be 1 ≤ X ≤ 12; for X = 12 it would be
7 ≤ X ≤ 19; and for X = 18 it would be 11 ≤ X ≤ 25. A 0.95 confidence
interval for N = 20 and X = 5 would be 1 ≤ X ≤ 10; and for X = 10 it
would be 5 ≤ X ≤ 15.
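
For readers who wish to compute intervals of this kind, the following
is a minimal sketch of one common approach, the exact (Clopper-Pearson)
binomial interval. The paper does not say how its intervals were
obtained, so this method is an assumption and its output need not match
the bounds quoted above exactly; the function names are likewise
introduced here for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(x, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence limits for a proportion."""
    alpha = 1.0 - conf

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x) = alpha/2; upper limit: P(X <= x) = alpha/2.
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


for n, x in [(36, 6), (36, 12), (36, 18), (20, 5), (20, 10)]:
    lower, upper = exact_interval(x, n)
    # Limits are rescaled to counts out of n for comparison with the text.
    print(n, x, round(lower * n, 1), round(upper * n, 1))
```

The limits are proportions; multiplying by N expresses them as counts,
which is how the intervals in the text are stated.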



Task 1: Selecting the Topic(s)

Data were not coded on this task since it was not recognized as an
important part of doing a review until the coding had been completed.

Task 2: Use of Previous Reviews (V11 and 12)

Just as reviews of past primary research are useful for preparing and
interpreting present research on a given topic, previous reviews of a
topic can be quite useful for preparing and interpreting present reviews
of research literature. Judging by the frequent citation of earlier
reviews in the sample of review articles, there seems to be rather
widespread agreement on this point. Seventy-five percent of the 36
randomly sampled integrative review articles cited previous reviews on
the topic or on similar topics (V11), but only two of these 27 provided
any critique of the previous reviews (V12).

The uncritical acceptance and use of previous reviews is as undesirable
as the uncritical acceptance and use of any other research. One of the
widely recognized responsibilities of a researcher is to examine
critically all evidence used in his or her research. Important decisions
about the focus, methods, and interpretations of a cumulative review are
probably sometimes heavily influenced (consciously or unconsciously) by
examinations of previous reviews on the topic or similar topics. There
is nothing incorrect or undesirable about this if the reviewer uses
scholarly judgment in evaluating the strengths and weaknesses of the
previous reviews.

A good example of critically examining previous reviews on the
topic is provided by Lambert (1976). He examined a number of previous
reviews on his topic, sought reasons for their discrepant conclusions,
and then used that information to improve the procedures of his own
review.

Task 3: Sampling (V16, 19, 24a-c, and 45)

The results of any integrative review will be affected importantly by
the population of primary studies that it is focused upon and by the
manner in which the actually reviewed studies are selected from that
population.

Only one of the 36 randomly selected review articles reported using
indexes (such as Psychological Abstracts or Dissertation Abstracts) or
information retrieval systems to locate primary or secondary studies for
possible inclusion in the review (V16). Only three of the 36 review
articles reported searching bibliographies of previous reviews or
querying experts on the topic in an effort to locate appropriate sources
for their review (V19).

It seems reasonable to assume that these results mainly reflect
reviewers' failure to report how they searched for sources, rather than a
failure to use the indicated means for the search. It is almost
inconceivable that most reviewers do not use indexes or bibliographies.
The failure of almost all integrative review articles to give information
indicating the thoroughness of the search for appropriate primary sources
does, however, suggest that neither the reviewers nor their editors
attach a great deal of importance to such thoroughness.

The question of whether a set of located studies on a topic ought to
be considered a sample or a population is a difficult one. But in either
case it is highly desirable to locate as many of the existing studies on
the topic as is possible. Since there is no way of ascertaining whether
the set of located studies is representative of the full set of existing
studies on the topic, the best protection against an unrepresentative set
is to locate as many of the existing studies as is possible. Then if the
number of located studies is greater than can be carefully reviewed, a
sample of those studies can be used in the review.

Data were collected on the extent to which the reviewer discussed or
analyzed the full set of located studies on the topic. One of the 36
randomly sampled reviews discussed or analyzed the full set of located
studies, 6 of the reviews clearly did not, and for the 29 other reviews
the information given in the published article was insufficient for
making a judgment on this matter (V45).

Data were not collected on how reviewers selected studies for analysis
or discussion from the located studies. The coders' impression, however,
from reading the reviews, is that subsets were usually purposive samples
of "methodologically adequate studies" or of "representative" studies.
For instance, Glass (1976a) analyzed only those studies that had a
control group. Sechrest (1976) indicated, "An annual review, even in an
area so circumscribed as personality, cannot serve as a substitute for
Psychological Abstracts. No pretense of breadth or depth of coverage is
made here. The materials cited were chosen because they fit a topic or
illustrate a point to be made" (p. 9). And Demerath and Roof (1976)
indicated, "It is manifestly impossible to summarize the entire recent
literature. Instead, we have highlighted empirical studies that mark
significant conceptual and/or methodological advances" (pp. 19-20).

For the purpose of making generalizations, some sort of random sample
(simple, stratified, multistage, etc.) is the most appropriate. Such
samples do not assure a representative sample, but neither does any other
approach, and random samples have the advantage of allowing an estimate
of the probability of drawing a significantly unrepresentative sample.

It is possible for a review to analyze more than one subset of located
studies on the topic. Sometimes there may be justification for including
an examination of a purposively sampled subset that is exemplary in some
manner, but since the definition of an integrative review used in this
study is limited to those in which generalizations are sought, there does
not seem to be any justification for using a purposive sample except in
conjunction with a random one.
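
As an illustration of drawing simple and stratified random samples from
a set of located studies, a minimal sketch follows. The study
identifiers, the three discipline strata, the sample sizes, and the
function names are all hypothetical; the paper does not prescribe a
particular procedure.

```python
import random


def simple_random_sample(studies, k, seed=None):
    """Draw k studies without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(studies), k)


def stratified_random_sample(studies, stratum_of, k_per_stratum, seed=None):
    """Draw up to k_per_stratum studies at random within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for study in studies:
        strata.setdefault(stratum_of(study), []).append(study)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return sample


# Hypothetical located studies: (identifier, discipline).
located = [
    (f"study-{i:02d}", field)
    for i, field in enumerate(
        ["sociology", "psychology", "education"] * 10, start=1
    )
]

simple = simple_random_sample(located, 6, seed=1)
stratified = stratified_random_sample(located, lambda s: s[1], 2, seed=1)
print(simple)
print(stratified)
```

Recording the seed alongside the list of located studies lets a reader
of the review reproduce the draw.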



Task 4: Representing Characteristics of the Primary Studies
(V53a-e and 60)

The representation of the characteristics of the primary studies is,
in effect, the data collection of integrative reviews. The manner in
which this is done can substantially affect the results and
interpretation of the cumulative review.

Twenty-eight of the 36 randomly sampled reviews either reported the
findings of many of the individual reviewed studies or indicated how many
or what percentage of the studies had each type of finding or result
(V44a). Of the 28, eighteen represented at least one finding of the
primary or secondary research with an indication of the direction and
magnitude of the difference or of the association (by standard score
difference measures, or rp, r^, T, R, W, K, rp, R, etc.) (V53a). Only 4
of the 28 reviews made at least one clear distinction among: significant
positive findings, non-significant positive findings, non-significant
negative findings, and significant negative findings (V53b). Only
one of the reviews clearly represented the findings of the primary
studies in any of the other investigated ways (V53c-e). For 10 of the 28
reviews, there was insufficient information for judging how the reviewer
represented any of the findings of the primary studies. It should be
noted that information on items V53a-e was coded as Yes if there was any
instance in the review that reflected the item. Consequently, it was
quite possible for a review to be coded as having represented a finding
of a primary study with a magnitude measure, and yet for there to have
been no clear indication of how the reviewer represented most of the
findings. It was also possible and fairly common for the reviewer to
report the findings of many individual studies, but in a manner such that
it was impossible to judge how the reviewer had represented most of the
findings. The impression of this writer is that such ambiguities were
present in about 50 percent of the review articles. It was common to find
reports that "Johnson found a relation between X and Y, but Alexander and
Henderson did not." It was often impossible to know whether reported
relations were statistically significant or included those that were
"substantially" different from zero but not statistically significant.
It also was common for findings of primary studies to be reported as
statistically significant with no explicit indication of their direction.

Every reviewer has to represent the findings of the primary studies
in some manner, and though items V53a-e may not exhaust the ways in which
this can be done, they almost certainly include the ways most commonly
used. The alternatives are mentioned above in order of decreasing amount
of information they provide; they descend from an interval measure, to
ordinal measures, to a nominal measure. Of the alternatives, the
magnitude measures with a directional sign (V53a) are clearly the
preferred way of representing the findings of the primary studies.
To analyze these it is necessary to reduce them to a common metric, a
chore that is not always easy, but one on which some development work is
currently being done (Glass, 1977a, 1977c). The next best alternative
is representing the findings as significant (+), non-significant (+),
zero, non-significant (-), and significant (-) (V53b). This alternative
may be best if magnitude measures or the data needed to calculate them
are not reported in many of the primary studies, but it is quite inferior
to the first approach, as will be discussed in the next subsection. The
worst of the alternatives indicated in the coding instrument generally is
the last (V53e), where the findings are represented as a significant
difference in one given direction or not so. This alternative should
usually be avoided, for it produces ambiguous data, unless most of the
primary studies being reviewed used one-tailed tests of their hypotheses.
For instance, if 12 out of 26 findings are significantly positive, is
this good evidence of a positive relation for the studied phenomena? It
largely depends on how many of the remaining 14 studies had significantly
negative findings, which cannot be determined from this representation.

No data were collected on how the reviewer represented the independent
variables of the primary studies. A preliminary examination had shown
that review articles hardly ever indicate this. The representation of the
independent variables of primary studies can have a major impact on the
results of a review.

Only one of the 36 randomly sampled review articles indicated that the
reviewer, when encountering reports of primary and secondary studies that
did not have all the information needed for the analyses, sought to get
the information from the authors of the reports of those studies or from
detailed final reports of funded research, or calculated or estimated the
information from the other information given in the initially reviewed
report of the study (V60). It is just about inconceivable that 35 of the
36 reviews did not encounter problems with missing information. What
cannot be determined from this study is whether the failure of review
articles to report efforts to get such information reflects an omission
in the reports or an omission of efforts to get the information. This
writer suspects that it is some of each, but predominantly the latter.

In primary research today it is quite common for the investigators to
make rather extensive efforts to minimize missing data and to report
those efforts briefly. It would appear that similar efforts and reporting
procedures are equally desirable for reviews.

Task 5: Analyzing the Primary Studies (V41, 56a-c, 57a-e, and 62)

Analysis is the process by which the reviewer makes inferences from
the primary studies. It includes: judgments about the implications of
identified methodological strengths or weaknesses in the primary studies,
estimates of population parameters of the studied phenomena, and
assessments of how varying characteristics of subjects, context, and
treatments or suspected causal variables may affect the phenomena.
Twenty-six of the 36 reviewers described what were considered to be the
major methodological difficulties or shortcomings of the primary research
that was reviewed (V41). Some of the other 10 reviewers may have examined
these difficulties or shortcomings but failed to report on them.

If more than a small portion of the reviewed studies have serious
methodological weaknesses, these limitations can sometimes lead to
invalid inferences unless their effects are considered before drawing
inferences about the topic. No data were collected on how identified
weaknesses in the methods of the primary studies were taken into account
when making inferences from those studies. The impression of this writer
is that the most common approach was to indicate that inferences about
the topic were unreliable if many weaknesses were found. The second most
common approach appeared to be to discard the methodologically
"inadequate" studies and base the inferences on the remaining ones. A
third approach that appeared to be used in at least a few reviews was to
identify weaknesses in the research which supported one point of view and
thus discredit the evidence for that point of view, without applying the
same standards of methodological adequacy to the research which supported
another point of view. All three strategies raise the question of what
constitutes a serious threat to the validity of a given study and what
does not. There is no simple answer. If a modest number of the studies
are devoid of such threats, the impact of the threats in the other
studies can be examined empirically in a manner that will be discussed
later in this paper.



It should be noted that the actual threat to the internal and external
validity of a study is not determined exclusively or even primarily by
the design of the study. Campbell and Stanley's important and widely read
monograph (1963) on experimental and quasi-experimental designs shows
which threats are controlled by various different designs. But the
monograph does not indicate which threats are likely to be trivial in a
given study nor which threats can be reasonably controlled by other
means. For instance, instrument decay may be a serious threat to internal
validity when using a rater's judgment of people's emotional health, but
is unlikely to be a serious threat when measuring children's height using
the kind of device that is common in physicians' offices. Similarly,
obtrusive measures of a variable pose a more serious threat of testing
effects than do unobtrusive measures. And studies where the data are
collected over a brief period of time are less likely to have their
validity threatened by history and maturation than are studies where the
data collection extends over a longer period of time.

It is the impression of this writer that some reviewers will label as
methodologically inadequate any studies which do not have experimental
or strong quasi-experimental designs. Sometimes this is appropriate, but
the above discussion ought to indicate that it is not always appropriate.

Often, but not always, when there is a sizable number of studies on
a given social science topic, there are some results which appear
incongruent with the other results. There are a number of possible
reasons for varying results in a set of studies on a given topic. One of
these is random sampling error. Sampling theory indicates that, when
there is a set of studies from a given population, the findings will vary
some. About half of the study findings will be greater than the
population parameter and about half will be less than the population
parameter. In addition, if each study's findings are tested for
statistical significance at the .05 level with the null hypothesis being
the true population parameter, about 2.5 percent will have findings
statistically significantly greater than the population parameter, and
about 2.5 percent will have findings statistically significantly less
than the population parameter. This sampling error has to be taken into
account when judging whether or not variations in the findings should be
considered congruent.

There is strong evidence that some of the reviews failed to take this
source of variation into account. Barnes & Clawson (1975, p. 651) reported,
"The efficacy of advanced organizers has not been established. Of the 32
studies reviewed, 12 reported that advance organizers facilitate learning
and 20 reported that they did not." But an examination of the evidence
indicates that the 12 studies yielded statistically significant positive
findings, and the other 20 comprise studies which yielded non-significant
(+) findings, zero difference, non-significant (-) findings and perhaps
significant (-) findings. Barnes & Clawson did not report how many of the
20 studies yielded each type of finding. If the population value was zero,
and all the 32 studies tested their hypotheses at the .05 level, then it
is expected that between zero and two of the studies (.05 x 1/2 x 32 = .8)
would have statistically significant (+) findings rather than the 12 that
actually did. Unless several of the 20 studies had statistically
significant (-) findings, Barnes and Clawson's data strongly suggest that
advanced organizers have at least a small positive effect on learning. If
there are considerably more than the expected number of both significant
(+) and significant (-) findings, it is possible that the population has a
bimodal distribution, or that the examined studies were of two or more
populations, despite appearances to the contrary.
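
The expected count in the Barnes and Clawson example can be checked with
a short calculation. The following is a minimal sketch, assuming (as the
text does) that each of the 32 studies independently has a 2.5 percent
chance of a significant (+) finding when the population value is zero.
The function name binomial_upper_tail is illustrative only and is not
taken from any of the cited works.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_studies = 32          # studies reviewed by Barnes and Clawson
p_sig_pos = 0.05 / 2    # chance of a significant (+) finding under a true null

# Expected number of significant (+) findings: .05 x 1/2 x 32
expected_sig_pos = n_studies * p_sig_pos
# Chance that no more than two studies are significant (+)
prob_two_or_fewer = 1 - binomial_upper_tail(n_studies, 3, p_sig_pos)
# Chance of at least the 12 significant (+) findings actually observed
prob_twelve_or_more = binomial_upper_tail(n_studies, 12, p_sig_pos)

print(f"expected significant (+) findings: {expected_sig_pos:.1f}")
print(f"P(two or fewer): {prob_two_or_fewer:.3f}")
print(f"P(twelve or more): {prob_twelve_or_more:.2e}")
```

Under these assumptions, two or fewer significant (+) findings would
occur in roughly 95 percent of such sets of studies, while twelve or more
would be extremely rare.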

Another example of reviewers failing to take sampling errors into
account when making inferences from a set of studies is the review by
Schultz and Sherman (1976). Twenty-two of the 62 studies cited in this
review had significant (+) findings; no information is provided on the
number of significant (-) findings. Schultz and Sherman wrote:

    The many nonsupportive studies, the qualifications to
    some of the supportive studies, and in particular, the
    consistent failure to replicate interactions between
    social class and reinforcers lead us to several
    conclusions. 1) Social class differences in reinforcer
    preferences can not be assumed. (p. 39)

If the population value were zero, 62 studies tested at the .05 level
would be expected to yield between zero and seven significant (+) findings
rather than the 22 that did occur. There is, however, a factor suggested
by Schultz and Sherman that does complicate the interpretation. They
claim that methodologically superior replicates of earlier studies
that had found significant (+) findings often failed to yield such
findings. This does raise a legitimate concern, but there are some
questions as to its implications. First, Schultz and Sherman indicated
that only three of the 22 studies with significant (+) findings were
unsuccessfully replicated (by a total of [number illegible] studies).
Second, Schultz and Sherman did not indicate whether these failures to
replicate yielded non-significant (+) findings, non-significant (-)
findings or significant (-) findings, and such information is important in
interpreting the findings. Third, investigators who conduct a replication
may be predisposed to disprove the original study, and these
predispositions may create some biases in their investigations despite
some real methodological improvements.

There were a number of other reviews examined in this study that
also disregarded the distribution of the findings of the reviewed
studies, but these reviews also suggested a number of reasons why the
bulk of their reviewed studies might be invalid and thus provided
some justification for the omission.

Some other reviews had so few studies on the topic that it would be
impossible to infer reliably whether all but the most skewed
distributions could not reasonably be expected to come from a population
where there was "no difference." In the random sample of 36 examined
reviews, however, there were at least 18 reviews which did not provide
adequate information for judging whether or not the reviewer had
interpreted variations in the findings of the primary studies in light of
expected sampling error (V41a and V53a-c).

Care has to be exercised when analyzing the distribution of findings
among the four categories of statistically "significant (+)," "non-
significant (+)," "non-significant (-)," and "significant (-)." One
complication is that the above discussion has to be modified unless the
null hypotheses tested in each primary study were ones of "no difference"
or "no relationship." Null hypotheses usually are stated as such, but
occasionally they are not. A second complication is that the above
discussion presumes that all the tests of hypotheses were two-tailed
tests. A third complication is that the above discussion presumes that all
the primary studies tested their hypotheses at the same level of Type I
errors. A fourth complication is that the above indicated method of
analysis does not provide information on the magnitude of the differences
or relationships. If the sample sizes of many of the primary studies are
quite large (say greater than 500), the method would lead to the
conclusion that there is a difference even if the population parameter is
only trivially greater or less than zero. This conclusion would not be
incorrect but it would be unimportant.

It is also possible to analyze the distribution of findings among the
two categories of "positive" and "negative." This can be done by using
the binomial distribution, or an approximation of it. This method is
subject to all the above-mentioned complications except the third one.
It can yield trivial conclusions either if many of the N's of the primary
studies are quite large or if a quite large number of findings is
analyzed.
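
The binomial analysis of "positive" and "negative" findings can be
sketched as follows. This is only an illustration, assuming a two-tailed
test of the null hypothesis that positive and negative findings are
equally likely, and assuming that findings of exactly zero difference are
set aside. The function name and the tally of 15 positive and 5 negative
findings are hypothetical.

```python
from math import comb


def sign_test_p_value(n_positive, n_negative):
    """Two-sided exact binomial (sign) test of H0: P(positive) = 1/2."""
    n = n_positive + n_negative
    k = max(n_positive, n_negative)
    # Probability of a split at least this lopsided in one direction.
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double for the two-tailed test, capped at 1.
    return min(1.0, 2 * one_tail)


# Hypothetical tally: 15 positive and 5 negative findings.
p_value = sign_test_p_value(15, 5)
print(f"two-sided p = {p_value:.4f}")
```

As the text cautions, a small p value here says nothing about the
magnitude of the differences found in the primary studies.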

In addition to the sampling error, there are at least three other
causes of variations in the findings of a set of primary studies on a
topic. These include: the studies in the set examined different
phenomena, the studies in the set examined the same phenomena under
differing circumstances which affected the findings, or the methods of
the studies varied and affected their findings. These reasons can be
tested by examining the relationships of the varying characteristics of
the studies to the varying results. None of the 36 randomly sampled
reviews did such an analysis in a multivariate manner where two or more
of the varying characteristics of the primary and secondary studies were
simultaneously tested for relationships with the varying results (V56a).
Two of the 36 reviews did univariate analyses, examining the relationship
of a single characteristic of the studies to the varying results (V56b).
Another five of the 36 reviews made such analyses in a systematic
discursive manner, whereby they discussed how one or more characteristics
of the primary and secondary studies were related to differences in the
findings across the full set of analyzed studies (V56c). A total of only
7 out of the 36 articles reported analyses by any of the above three means.

This is not because the other 29 reviews did not have discrepant
results. As was reported earlier, 32 of the 36 reviews had at least some
incongruent findings (V24a). Perhaps the 29 reviewers did such analyses,
but did not find statistically significant results, then chose not to
report the results; or perhaps they simply failed to do such analysis.

It should be noted that the reviews were not coded as using systematic
discursive analysis (V56c) unless they discussed how a characteristic of
the study related to differences in the findings across all or almost all
studies in the analyzed set. The impression of this writer is that most
of the reviews did suggest some explanation for the observed differences
in the findings and many offered some evidence for the explanation, but
that evidence was usually less systematic than coded in V56a-c. For
instance, a reviewer might point out that the study that had the
highest Y also had a higher X than the study that had the lowest Y,
while not mentioning the relation between X and Y in the other studies
on the topic.

It is not at all clear why systematic analyses of the correlates
of varying findings are not done more often in integrative reviews.
Perhaps it is because the reviewers find so many differences among
studies that they despair of being able to find systematic relations.
Or perhaps it is because the reviewers have simply not thought to do such
analyses.

Whatever the reasons, the effect of this omission is obvious and
serious. Without such analyses reviewers will sometimes incorrectly
infer that the findings of a reviewed set of studies are contradictory
and that the available evidence is inconclusive. It seems almost certain
that some of the confusion that surrounds many topics in the social
sciences is partly a result of reviewers' frequent failure to search
systematically for explanations of the varying results. Multicollinearity
or weak correlations will sometimes preclude explanations of the
variations, but the search ought to be conducted despite such possibilities.

Sometimes when doing a review, one or two primary studies may be
located that offer unusual potential for shedding light on some
important issue, if only their original data could be reanalyzed. In
such cases a secondary analysis is appropriate. Only one of the 36
randomly sampled reviews reported having done a secondary analysis
(V62). It seems unlikely that such analyses would be done and not
reported, but it is not at all clear how many times there was
justification for doing such analyses.

Sometimes secondary data analysis can be done with a minimum of
resources, but sometimes it cannot. Data sets are sometimes lost or
inadequately documented; in addition, promises of confidentiality
sometimes make it impossible to release unaggregated data.

It should be noted that close congruence among the findings of a
set of studies on a given topic does not necessarily indicate that the
evidence is valid, and the lack of close congruence among the findings
does not necessarily indicate that the evidence is inconclusive. For the
purposes of this discussion, the findings of a set of studies on a given
topic will be considered congruent if they do not vary more than could be
expected by chance from random sampling error. It is possible for the
findings to be congruent, but to be invalid. This could occur if all the
findings were biased by one or more methodological flaws that were common
to all of the studies, or if all the findings had the same net bias but
caused by different methodological flaws in different studies. The latter
is possible, but not particularly likely.

It should also be noted that one or more methodological flaws in a study,
even when serious ones, need not cause biased findings. They only create
a threat to validity which may or may not cause a bias.

If the findings are incongruent it is still possible for all of them
to be valid. This might be so when the varying measures of the outcome
variable represent several somewhat different constructs or when subject
characteristics, scope conditions or contextual variables vary among the
studies and affect the outcome variable. For instance, the relationship
between X and Y may vary in different regions of the country, over
different age groups of subjects, or under different social, economic
or political circumstances. If the different studies varied in respect
to these factors, their results might vary substantially and yet all
might be perfectly valid. This is why it is important in integrative
reviews, when the findings are not congruent, to search for and examine
factors which may systematically co-vary with the findings.

Glass's Meta-analysis

Glass's meta-analytic approach involves transforming the findings of
individual studies to some common metric, coding various characteristics
of the studies, and then using conventional statistical procedures to
determine whether there is an overall effect, subsample effects, and
relations among characteristics of the studies and the findings. The
original data for each unit of analysis in a study are not used. Rather,
the unit of analysis is the study, and summary data from each study are
analyzed. For instance, if there is a set of experimental studies which
investigated the impact of X on Y, for each study one might code the
average age and SES of subjects, the duration of treatment (X), the
setting in which the treatment was applied, an estimate of the reactivity
or fakeability of the outcome measure used, an estimate of the internal
validity of the research design, and the date when the study was conducted.
Then these variables would be used in a univariate or multivariate manner
to predict a standardized measure of the findings.

Glass suggests that when most of the studies are experiments with a
control group, the standardized measure of the findings be a standard
score difference measure calculated by the mean difference of the
experimental and control group divided by the within group standard
deviation of the control group (Glass, 1977c, p. 39). He suggests that if
most of the studies are correlational, and use different measures of
association, the standardized measure be a product-moment correlation; he
provides formulas for estimating product-moment correlations from various
other measures of association such as the point-biserial correlation,
Spearman's rank-order correlation, Mann-Whitney U, as well as t and F
(Glass, 1977a, pp. 4-10).
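
The two kinds of standardized measure just described can be sketched as
follows. The first function implements the standard score difference
exactly as worded above. The second uses a common textbook identity for
the correlation implied by a two-group t statistic; it is offered as an
assumption for illustration and is not necessarily the formula Glass
gives. All function names and input values are hypothetical.

```python
import math


def glass_delta(mean_experimental, mean_control, sd_control):
    """Standard score difference: mean difference over the control group SD."""
    if sd_control <= 0:
        raise ValueError("control-group standard deviation must be positive")
    return (mean_experimental - mean_control) / sd_control


def r_from_t(t, df):
    """Correlation implied by a two-group t statistic with df degrees of freedom.

    Uses r = sqrt(t^2 / (t^2 + df)), carrying the sign of t.
    """
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


# Hypothetical summary statistics from one primary study.
effect = glass_delta(mean_experimental=55.0, mean_control=50.0, sd_control=10.0)
r = r_from_t(t=2.0, df=16)
print(f"standard score difference = {effect:.2f}, r = {r:.3f}")
```

Only summary statistics from each study are needed for either measure,
which is consistent with treating the study as the unit of analysis.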

The meta-analytic approach has a number of strengths. First, it is a
systematic, clearly articulated, and replicable approach to integrating
results from a set of studies. Second, it can be used with information
from both the best and the less-than-best studies on a topic, but with
controls for possible biases caused by various flaws in the available
studies. Third, it can provide estimates of the population parameters.
Fourth, when using multivariate statistical procedures it provides a
method for simultaneously investigating the relationships of variations
among studies in respect to their population of subjects, their scope
conditions, the intensity and duration of their treatments, and other
factors, with variations in the findings. No approach commonly used to
date for doing analyses in integrative reviews has been capable of
doing this.

Glass has indicated some difficulties and unresolved questions about
the application of his approach. He has pointed out: 1) it is sometimes
difficult to get a standardized measure of the finding from a study
because of insufficient data, 2) variance stabilization transformations
may be desirable for measures of the finding where distribution of the
criterion variable is attenuated, 3) the normal distribution assumption
when transforming dichotomous data via probit transformations needs to be
examined, 4) there is a problem of how to analyze results that are nested
within variables analyzed in a study, 5) findings perhaps should be
weighted by their sample size, and 6) there are problems of analyzing
aggregate data (the study as the unit of analysis) when trying to make
inferences about unaggregated phenomena.

There are some other limitations and problems in the application of
this approach which have not yet been discussed in published form, and
which will be mentioned below. It should be noted that most of these
difficulties are common to all analytic approaches to integrative reviews.
Nevertheless, they are important to keep in mind when doing or
interpreting meta-analyses.
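
Glass's fifth point above, that findings perhaps should be weighted by
their sample size, can be illustrated with a simple weighted mean. This
is a minimal sketch of one possible weighting scheme, chosen here for
illustration; the source does not specify how the weighting should be
done. The function name and the three hypothetical studies are
assumptions.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Mean of study effects, each weighted by its sample size."""
    if len(effects) != len(sample_sizes) or not effects:
        raise ValueError("need one sample size per effect")
    total_n = sum(sample_sizes)
    return sum(e * n for e, n in zip(effects, sample_sizes)) / total_n


# Hypothetical standardized findings and sample sizes for three studies.
effects = [0.20, 0.50, 0.80]
sample_sizes = [400, 50, 50]

unweighted = sum(effects) / len(effects)
weighted = weighted_mean_effect(effects, sample_sizes)
print(f"unweighted mean = {unweighted:.2f}, weighted mean = {weighted:.2f}")
```

In this hypothetical case the large study pulls the weighted mean well
below the unweighted mean, which shows why the choice matters.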

One limitation of the meta-analytic approach is that it can assess
only relatively direct evidence on a given topic. Sometimes a topic of
importance has not been directly investigated but there are studies with
indirect evidence that can be reviewed and woven together. For instance,
if the topic is "Will substance X reduce chronic depression in adults?",
there may not have been any studies on that question, but there may have
been studies of the effects of X on depression in baboons and studies of
the similarity of effects of other chemicals on depression in baboons and
humans. The meta-analytic approach can be used for evaluating the results
within each set of studies, but it cannot weave together the evidence
across sets of studies on related topics.

A second limitation of the meta-analytic approach is that it cannot
be used to infer which characteristics of studies on a given topic caused
the differing results. Statistical analyses can provide good evidence of
causal relations only when the data are from experiments or strong quasi-
experiments. The characteristics of reviewed primary studies are not
systematically manipulated in an experiment or quasi-experiment, even when
the studies used experimental designs to investigate the given topic.

The third limitation of meta-analysis is applicable when the set of
primary studies is a sample from a larger population and when multivariate
statistics are used to analyze the findings. Under such circumstances,
there must be a substantial number of primary studies on the topic, but
there are no clearly documented standards for sample sizes when doing
multiple regression. Kerlinger and Pedhazur suggest at least 30 cases for
each predictor (1973:282). Other well-respected statisticians think these
suggestions are excessive (Coleman, 1975; Glass, 1977b). It should be
noted, however, that the number of cases may well be greater than the
number of studies, because Glass suggested using an "effect" as the unit
of analysis in meta-analysis and each study may have more than one
"effect." An effect is defined as any analysis within a study of a given
treatment and outcome at a given time of measuring the outcome.

A fourth problem when doing meta-analyses is deciding whether or not a
set of studies on a topic ought to be considered a universe or a sample.
This has a bearing on whether tests of statistical significance are
appropriate, and the number of cases needed to use various statistical
tools appropriately. Some sets are obviously samples, such as when a
random sample of articles is drawn from a specified sampling frame or when
a convenience sample is assembled (the latter does not meet the
assumptions of inferential statistics). When the set is a result of a
thorough search, the matter is not so clear. First, it is quite likely
that even a thorough search will miss some, if not many, of the
unpublished studies. Second, even if the search was successful in locating
virtually all of the completed studies on the topic, these studies might
be considered only a sample of the phenomena being studied or a sample
of all possible studies on the topic. Glass initially suggested that the
located studies be considered a population (1977b), but he subsequently
has treated them as samples (1977a, 1977c). This writer's tentative opinion
is that the set of studies should usually be considered a sample because
the analysis of an integrative review is usually intended to make
inferences about the phenomena investigated in the individual studies
rather than about studies on the phenomena.

A fifth problem when doing meta-analysis is the lack of common metrics
for the measures used and reported in the various primary studies on the
topic. There are at least three aspects of this problem. First, different
constructs are sometimes studied under a single topic. For instance, the
outcomes of various studies on the effects of psychotherapy include
emotional health, happiness, social relations, and others. Second, for any
given construct there are alternative measures whose metrics may not be
equivalent. For instance, what is described as upper middle SES on one
measure may be described as middle SES on a second measure. Third, the
statistics used to measure a relationship between two or more variables
can vary in different studies. Studies may use r, t, rho, tau, or others.

Glass has suggested the first aspect of the problem is often not serious
and can be ignored (1977d). He would argue that all the various outcomes
mentioned above in the example of psychotherapy are aspects of mental
health and can be lumped together for a general investigation of the
effects of psychotherapy. When the effects are thought to perhaps vary
among different outcome constructs, Glass suggests including data that
indicate major distinctions among the constructs and using it as a
predictor in multiple regression analyses or as a stratifying factor in
nested analyses. Though Glass directs his suggestion to variations in the
construct of the criterion, it is equally appropriate for variations in
the construct of predictors.

The second aspect of the problem is one that past reviewers have often
complained about. When different studies use different measures of the
same construct, and when the measures have not been validated, there is a
serious question about the equivalence of the values generated by the
different measures. For some characteristics such as age and sex, there is
seldom any problem, but for others such as self-image and social support,
there often will be a problem for which there is no simple solution. It
should be noted that it is incorrect to rationalize that what variations
exist in the metric of some variable will only serve to reduce the strength
of relationship between that variable and some second variable, and
therefore can be ignored if strong relationships are found. This would be
true if the variations in the metric are not correlated with the second
variable, but generally there is no assurance that this will be true.

Glass and his students have already completed some work that reduces
the third aspect of the problem. They have assembled equivalency functions
for some statistics and developed others (Glass, 1977a; White, 1976). Some
of these functions are mathematical identities, but others are
approximations. To date that work has not indicated the conditions under
which the approximations become poor ones. This is a fertile subject for
future research.



A sixth problem faced in meta-analysis is that of achieving valid and
reliable coding of the characteristics of the primary studies that are to
be analyzed. This problem includes the previously discussed one, but
extends beyond it. When the set of reviewed studies is relatively small,
the coding is likely to be done by a single investigator. But coding, say,
40 studies may require as many as 60 to 80 hours, and this work may be
stretched over a 4- to 6-week period, thus raising serious threats to
coding stability. When large numbers of studies are being reviewed, a
number of coders may be used, which raises the additional problem of
inter-coder reliability. Failures of memory, boredom, and migraine
headaches can undermine sustained coding reliability. When the coding is
done over a lengthy period of time, inter-coder reliability should be
assessed more than once; reliability over time should also be assessed;
and periodic retraining may be needed.
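
One way to assess inter-coder reliability is sketched below. The text
does not name a particular statistic; simple percent agreement and
Cohen's kappa are used here as an assumption, since both are widely used
for two coders assigning categorical codes. The function names and the
ten hypothetical validity ratings are illustrative only.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of studies to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement expected from each coder's marginal code frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical internal-validity codes given to ten studies by two coders.
coder_a = ["good", "good", "good", "good", "poor",
           "poor", "poor", "poor", "good", "poor"]
coder_b = ["good", "good", "good", "poor", "poor",
           "poor", "poor", "good", "good", "poor"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

The same functions could be applied to one coder's ratings at two points
in time to assess the stability of coding over a lengthy period.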

A seventh problem faced in meta-analysis is how to control for the
effects of poor research design or execution among the reviewed studies.
Glass (1976b) provocatively argued that it is wasteful to discard poorly
designed studies from the analysis because "a study with a half-dozen
design or analysis flaws may be valid . . . [and] it is an empirical
question whether relatively poorly designed studies give results
significantly at variance with those of the best designed studies"
(p. [illegible]). Glass suggests testing whether methodological
characteristics such as the reactivity of the outcome measure and the
internal validity of the design are related to the distribution of
findings. He does not specify how this should be done other than by
examining the covariation between the design characteristics and the
findings. The appropriate test, however, is not quite as straightforward
as it may appear.

The relation can be examined in either a regression analysis or
analysis of variance. Either model can yield misleading results under
certain circumstances. Both models will be inadequate if there is not
at least a modest number of studies with good overall internal and
external validity. Since there is usually no reason to think that the
relationship between validity and the findings is linear or monotonic,
there is no basis for extrapolating from the relation that holds for
studies of poor and mediocre validity to the studies with good validity.
In integrative reviews of some topics, there may be very few if any
primary studies with good internal and external validity, and thus in
these cases the possible effects of less than good validity cannot be
assessed. If there is a modest number of studies with good validity but a
much larger number of studies with poor or mediocre validity, regression
analyses of the full sample of cases can underestimate the effects of
validity. This is because regression lines are fitted so as to minimize
the squared deviations of the bivariate or multivariate points, and
relatively little weight would be given to the small number of points from
the good validity studies.

Both analysis of variance and regression analysis will underestimate
the effects of validity if the mean level of the criterion is about the
same for poor, medium and good validity studies, but the variance is
considerably greater for the poor and medium validity studies. In such a
case, both types of analysis will correctly indicate that varying validity
does not affect point estimates of the criterion, but both would
underestimate the adverse effect of relatively poor validity on any
correlations with the criterion. It should be noted that the variances of
the different cells do not have to be statistically significantly
different for them to cause real and substantial underestimates of the
effects of varying validity.

Earlier in this report it was indicated the congruence of findings
does not assure their validity, and the lack of congruence is not proof
of invalidity. When there is strong congruence in the findings and no
good evidence of a strong common threat to the validity of all or almost
all of the studies, there is suggestive (but not conclusive) evidence
that any methodological weakness that existed in some of the studies
probably did not have a substantial effect on the findings of those
studies. But in the more common situation when there are some apparent
incongruences in the findings, it is important to have at least a
modest number of studies in a sample that are judged to have good overall
internal and external validity, if one is to empirically assess whether
the methodological weaknesses that vary over the studies may have
affected the findings.

What constitutes good enough overall internal and external validity
for these purposes cannot be simply explicated. It probably depends on
a number of factors, and needs further thought. It is important, however,
to remember that threats to validity can be controlled by means other than
design, and that some of the threats are likely to be trivial in any given
study. This was discussed on page 10 of this paper.

Glass (1977b) has suggested that if the quality of a study is found
to be related to the findings, a greater stake "should be put in the better
designed study." This might be done by some sort of weighting, nesting
analyses within subsets of the good and poor quality studies with more
reliance put on the results of the former, or by disregarding the poor
ones.

This discussion has very briefly outlined the major advantages of
the use of meta-analysis for integrative reviews, and has presented,
in considerably more detail, some difficulties with the approach. The
disproportionate attention given to the difficulties should not mislead
the reader into thinking that meta-analysis has more disadvantages
than advantages, or has more disadvantages than other analytical
approaches when doing integrative reviews. In the opinion of this writer,
the approach is methodologically sounder than most currently used
approaches. Though it does have some serious difficulties, most of these
difficulties are common to the other approaches. Also, the other
approaches have additional difficulties or limitations which are not true
of meta-analysis.

In short, the meta-analytic approach is an important contribution
to social science methodology. It is not a panacea, but it will often
prove to be quite valuable when applied and interpreted with care.

Task 6: Interpreting the Results (V67, 68, 69, 70, 71)

Seven of the 36 randomly selected reviews induced and reported new
theory, confirmation of old theory, or disconfirmation of old theory
(V67); 6 of the 36 induced and stated recommendations for policy or
practice, and four of those six discussed conditions which might affect
the impact of the policies or practices (V68 & 69); 28 of the 36 suggested
desirable foci or methods for future primary or secondary studies on the
topic (V70); and only 3 of the 36 suggested desirable foci or methods for
future reviews on the topic or related topics (V71).

There are other types of conclusions that the review articles may
have stated that were not coded, but it is surprising that fewer than
half made conclusions about either theory, policy or practice. It may
be that most integrative reviews are oriented towards making suggestions
for improving the primary research, or it may be that most start with the
aim of making suggestions for theory, policy or practice, but subsequently
decide to withhold inferences because they judge the available evidence to
be inconclusive. The latter reason seems unlikely since the studied reviews
usually did report one or more inferences about the topic. Of the 26
reviews that drew at least one inference of inconclusive evidence, 24 of
those also drew at least one inference that an investigated condition or
relation does exist either generally or for a specified subset of the
population or of the investigated situations (V69a-d). (Multiple inferences
were drawn in most of the reviews because they had multiple sub-topics
that were investigated.)

This writer has no strong suspicions as to why most review articles
do not make suggestions for future reviews. Perhaps it is because
reviewers do not think carefully about the methods of doing a review;
perhaps it is because after completing the often herculean task of doing
a review, the reviewers would not want to wish the task on anyone else;
or perhaps it is for some other reason. Regardless, it is quite apparent
that the resulting omission is unnecessary and harmful to the progress of
science. As with primary research, it is virtually impossible to do a
major review carefully without encountering some ideas for improved
methods and some additional questions that need to be answered but
cannot be answered in the given review. These ideas and questions
can be a valuable contribution to other investigators and ought to
be reported, even if they only can be used after the accumulation of
further primary research.

Task 7: Reporting the Review (V7, 13, 17, 18, 44a-d)


A widely held precept in all the sciences is that reports of research
ought to include enough information about the study that the reader can
second guess the author's inferences. This precept probably also ought
to apply to integrative reviews, since such reviews are a form of research.
As a minimum, it is widely held that the report ought to at least describe
the sampling, measurement, analyses, and the findings. Where unusual
procedures have been used, it is expected that they will be described in
some detail.

Some of the previously discussed results indicate that few of the 36
review articles reported certain methodological aspects of the review.
Only one of the 36 articles reported whether or not it used indexes and
information retrieval systems to search for primary studies on the topic
(V16); 3 of the 36 reported whether or not they used bibliographies as a
means of locating primary studies (V19); 7 of the 36 indicated whether or
not they analyzed the full set of located studies on the topic instead of
some subset (V45); and only one of the 36 indicated whether or not needed
information that was missing in the reports of the primary studies was
sought from other sources (V60).

A number of other aspects of a review that might be reported were
coded. Thirty of the 36 articles explicitly stated the topic being
reviewed (V7); 12 of the 27 articles that cited previous reviews indicated
how their review was to differ from previous ones (V13); the one article
that had indicated that it used indexes and information retrieval systems
also indicated the beginning and ending dates and the descriptors that
were used (V17 and 18).

Twenty-eight of the 36 review articles often reported the findings
of an individual study or indicated how many or what percentage of the
studies had each type of finding or result (V44a); three of the 36
articles often cited the range, mean or other summary indicator of the
findings of the studies (V44b); and half of the 36 reviews often just
reported a generalization followed by the citation of several studies
(V44c). The three approaches were coded independently and were not mutually
exclusive.

These results, taken together, indicate that integrative review
articles commonly fail to report their studies in the detail that is
fairly common for primary research articles. A number of important
functions are served by reporting various aspects of the review.



There are two reasons for carefully reporting the
literature search process in an integrative review article. First,
it helps the reader to judge the comprehensiveness and representativeness
of the sources that are the subject of the review. Just as
the sample in a primary study can critically influence the findings
of the study, the selection of the primary and secondary studies that
are included in a review can seriously affect the results of the review.
The bibliography of a review article indicates what individual studies
were included in the review, but it does not indicate what broad classes
of possibly relevant studies were excluded. A person with a thorough
knowledge of the research on the topic will be able to infer such
omissions by carefully examining the bibliography, but persons with less
thorough knowledge of the topic will not be able to do so. Secondly,
briefly detailing the literature search process in a review article
allows future reviewers of the topic to extend the review easily without
duplicating it. If it is known that most of the articles included in the
review were those listed under certain descriptors of certain years of
certain indexes, or found in the bibliographies of specified sources, it
is very easy for a subsequent reviewer to broaden or deepen the search
for relevant sources without duplicating the earlier work.

If some located primary studies were excluded from the analysis, the
manner in which this was done ought to be reported. Likewise, how
missing data are handled should be reported. An explicit statement of
the topic being reviewed and an indication of how the reported review
was to differ from previous ones on the topic often helps orient the
reader and prevent misinterpretations.

When the number of reviewed studies is less than about forty, it is
usually very easy to present a single page table indicating a number of
the investigated characteristics of the primary studies including their
findings, stated in either standardized or unstandardized form, or both,
and with the direction and statistical significance indicated. Such
information would allow any reader to reanalyze the studies and second
guess the reviewer's analysis. Such opportunity is always a little
threatening, but one of the oldest conventions of the scientific
community is making one's data available to other scholars after one
has had a chance to analyze and publish reports of it. When the number
of primary studies is quite large, it is not practical to include all the
data in the published report, but it should be available upon request
(with adequate documentation).
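
As a rough illustration of the kind of single page table described
above, the following Python sketch lays out one row per primary study
and tallies how many studies, and what percentage, had each type of
finding. The study labels, field names, and values are hypothetical
placeholders and are not data from any of the 36 reviews; the layout is
only one of many that a reviewer might choose.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StudyRow:
    """One row of a reviewer's summary table (all fields are illustrative)."""
    citation: str       # hypothetical study label
    n: int              # sample size of the primary study
    effect: float       # finding in standardized form, e.g. a correlation
    significant: bool   # statistically significant at the reviewer's chosen level

    @property
    def direction(self) -> str:
        """Sign of the finding: '+', '-', or '0'."""
        if self.effect > 0:
            return "+"
        if self.effect < 0:
            return "-"
        return "0"


def format_table(rows):
    """Return a plain-text table with one line per study."""
    lines = [f"{'Study':<12}{'N':>6}{'Effect':>9}{'Dir':>5}{'Sig':>5}"]
    for r in rows:
        sig = "yes" if r.significant else "no"
        lines.append(
            f"{r.citation:<12}{r.n:>6}{r.effect:>9.2f}{r.direction:>5}{sig:>5}"
        )
    return "\n".join(lines)


def tally_findings(rows):
    """Map (direction, significant) to (count, percentage of all studies)."""
    counts = Counter((r.direction, r.significant) for r in rows)
    total = len(rows)
    return {key: (count, 100.0 * count / total) for key, count in counts.items()}


# Hypothetical studies, for illustration only.
rows = [
    StudyRow("Study A", 120, 0.31, True),
    StudyRow("Study B", 45, 0.12, False),
    StudyRow("Study C", 80, -0.05, False),
    StudyRow("Study D", 200, 0.27, True),
]

print(format_table(rows))
for (direction, significant), (count, pct) in sorted(tally_findings(rows).items()):
    print(f"direction {direction}, significant={significant}: "
          f"{count} studies ({pct:.0f}%)")
```

A table of this sort gives the reader both the individual findings and
the counts behind any generalization, so the reviewer's inferences can
be checked without consulting every primary study.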

The practice of reporting a generalization followed by the citation
of several studies (V44c) was often used by half of the 36 reviews despite
the fact that it can be very misleading. Unless used in conjunction with
one of the other two approaches (V44a or V44b), this practice provided the
reader with no way of critically examining the inferences of the reviewer
unless he or she consults the full set of studies on the topic. For
instance, Dusek (1973) reported "there is considerable evidence that
during classroom interactions teachers treat groups of students differently
(e.g., Davidson, 1972; Good & Brophy, 1970; Schwebel & Cherlin, 1972)"
(p. 662). Hoffman (1972) reported, "Several investigators report that
while dependency in boys is discouraged by parents, teachers, peers,
and the mass media, it is more acceptable in girls (Kagan & Moss, 1962;
Kagan, 1964; Sears, Rau, & Alpert, 1965)" (p. 144). Both of these
statements give an implication of consensus among the available research
evidence, but neither of the statements would be incorrect even if the
majority of the located evidence contradicted their points. Jacoby made
a similar type of statement, reporting "Not surprisingly studies which
utilized price as the only independent variable (261, 262, 316, 4^2)
generally found a significant main effect..." (p. 336). Though the
"generally" in this statement provides an explicit warning that the
evidence was not entirely consistent, it still does not indicate what
percentage of the studies supported the finding, nor does it indicate
the magnitude of the findings. Some reviewers stated juxtaposed
generalizations such as, "Several studies found that X causes Y (Ace,
1967; Bace, Cace, 1969; Dace, 1970), but a few did not (Ease, 1968;
Face, 1970)." This type of presentation is less ambiguous than the above
examples, but this writer's impression is that it was not prevalent.



A number of characteristics of the primary research that might have
affected the methodological approaches used in each review were coded.
These characteristics were as follows: whether the primary research was
psychological, sociological, or about education (V2); whether the topic
of the review was about a condition, association, or causal relation
(V29a-c); the types of construct investigated in the primary research
(V30a-f); the predominant research orientation of the primary research
(V34 and 72); the percentage of primary studies that investigated at
least one statistical interaction effect (V36); and a crude estimate of
when research was first done on the given topic (V37b).

Most of the characteristics of the primary research were cross-tabulated
with some of the variables indicating the different methodological
approaches of the reviews. The variables that were excluded from those
analyses were those whose distribution in their original form or a
conceptually reasonable transformation would frequently have resulted in
expected cell sizes that would make a Chi-square test of the
cross-tabulation invalid. Thus V2, 75, 76, 77, 78 and ^78 were each
cross-tabulated with each of VH, 41, 44a, 44c, 56a, 56c.
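
The expected cell sizes referred to here can be computed directly from
the margins of a cross-tabulation. The Python sketch below is a minimal
illustration: the observed counts are hypothetical (they merely sum to
36), and the thresholds follow the rule this report applies later, under
which a Chi-square is treated as invalid when more than 20 percent of
the cells have expected values of less than 5.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_questionable(table, min_expected=5.0, max_share=0.20):
    """True when more than max_share of cells have expected counts below min_expected."""
    expected = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in expected if e < min_expected)
    return small / len(expected) > max_share


# Hypothetical 2 x 2 cross-tabulation of 36 reviews (illustrative counts only).
observed = [[10, 2],
            [20, 4]]

print(expected_counts(observed))
print(chi_square_is_questionable(observed))
```

With only 36 reviews, any variable with a skewed distribution quickly
produces cells of this kind, which is why such variables were left out
of the cross-tabulations.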

Only one of these 36 cross-tabulations had a Chi-square statistically
significant at the 0.05 level or less, but the Chi-square for this
cross-tabulation was invalid because more than 20 percent of the cells had
expected values of less than 5. In addition, when testing 36 hypotheses at
the 0.05 level, one or two false rejections of the null hypothesis can be
expected from chance if the hypothesis is true. Consequently, the analysis
failed to discover reliable evidence of a relationship between any of the
examined characteristics of the reviewed research and any of the examined
approaches to the methodological tasks of an integrative review. This, of
course, should not be interpreted as indicating that there are no such
relationships, but only that given the small sample size and the skewed
distributions of some of the variables of interest, no reliable inferences
could be made.
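
The arithmetic behind the "one or two false rejections" remark can be
sketched as follows. The expected number of false rejections is simply
the number of tests multiplied by the significance level. The second
function additionally assumes that the tests are independent, which is
an assumption made here for illustration only; the 36 cross-tabulations
share the same sample of reviews, so it does not strictly hold.

```python
def expected_false_rejections(num_tests, alpha):
    """Expected number of true null hypotheses rejected by chance alone."""
    return num_tests * alpha


def prob_at_least_one_false_rejection(num_tests, alpha):
    """Chance of one or more false rejections, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


num_tests = 36   # number of cross-tabulations reported in the text
alpha = 0.05     # significance level reported in the text

print(f"expected false rejections: {expected_false_rejections(num_tests, alpha):.1f}")
print(f"P(at least one): {prob_at_least_one_false_rejection(num_tests, alpha):.2f}")
```

Under these assumptions about 1.8 false rejections would be expected,
and the chance of at least one is roughly 0.84, so a single significant
result among 36 tests is weak evidence of a real relationship.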
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Ti vio-iir. he loc.itvd. TIk^ (in.iiysl^; of those rej,oliidert* wms> n^*: 

terribly enlightening and will not be discussed in this paper.

The editors of five prestigious social science journals that frequently
publish integrative reviews were asked about the evaluation
criteria and standards that they use to decide whether or not to publish
submitted integrative reviews. One editor provided printed guidelines
that he provides to prospective reviewers and his editorial assistants,
one editor failed to reply to repeated follow-ups, and three editors said
essentially that they rely on the professional judgements of their
editorial assistants and authors of invited reviews.

The officials in ten organizations that were thought to have major
responsibilities for integrative reviews in the social, biological and
physical sciences were asked about: 1) the formal or informal guidelines
or standards used by their offices to facilitate high-quality reviews of
sets of empirical research studies, 2) the evaluative criteria used to
judge the quality of such reviews, and 3) examples of such reviews that
they consider to be of unusually high quality. There were basically three
types of responses. A couple of the respondents reported some guidelines
or evaluative standards, but they generally were not very specific. Some
of the respondents indicated that they rely almost exclusively on the
judgment of the scientists who they have do their reviews. And two of
the respondents thought that integrative reviews were not often done in
their disciplines (math, physics and space sciences), though subsequently
it was discovered by this author that the Reviews of Modern Physics



Conclusion

It appears that relatively little thought has been given to the methods
of doing integrative reviews. It is clear that such reviews are important
to science and social policy-making and that many integrative reviews are
done less rigorously than is currently possible. It seems likely that some
of the confusion that surrounds many topics in the social sciences is partly
a result of unrigorous reviews of research on the topic.

n^j*. ^iv^ iiud fr^iLnr v;.^rk of Conf^ Cl.iv^^^ provide srverol Sde^iri ^ 
:<-: (ov::u: tJ^^^ pr^^vjijuu^ mnhoj.^ for rt^vii^us--; s^>nf' of ihn ide^is shf->\jldT> 
ht^ roiv\i<ivTCf] ti*;IirUt FLithcr ^ thi'^r^s* ir/ need tor fici'cntlsti 

•^h ^ %io u intciuarivc tcv^i^u:, cr^o^^ider the c*t>rits of the tdca*j> to 
-]^'r/^ - r*^ iVmj? rhf^ pr?t>Ir^v tc> vhirh tiiey are direcvd* rn try new oppr^ 
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